Inferring Causal Impact Using Bayesian Structural Time-Series Models

Filip Reierson

2023

Introduction

Introduction

Introduction

Introduction

Introduction

Potential outcomes

  • Let \(D\) be a treatment indicator.
  • \(Y(1)\) is the outcome that would have been observed if treated.
  • \(Y(0)\) is the outcome that would have been observed if untreated.
Potential outcome D=1 D=0
\(Y(1)\) \(Y\) missing
\(Y(0)\) missing \(Y\)

Average Treatment Effect on treated

\[\tau_{\mathrm{att}} = \mathbb{E} [Y(1)-Y(0) | D = 1]\]

If we assume \(Y(0) \perp D\), this enables us to use \(\mathbb{E} [Y(0) | D = 0]\) in the place of \(\mathbb{E} [Y(0) | D = 1]\). In panel data setting other methods based on regression are available, e.g., propensity score matching.

Some background: Abadie

  • Synthetic control.
  • Covariates made to resemble the treated series.
  • Applied to effect of terrorist activity on GDP in the Basque country.
  • Applied to California tobacco control program.

Abadie and Gardeazabal (2003) Abadie, Diamond, and Hainmueller (2010)

Brodersen et al. (2015)

The paper I have chosen to summarise is “INFERRING CAUSAL IMPACT USING BAYESIAN STRUCTURAL TIME-SERIES MODELS” by Brodersen et al. (2015).

  • Model uncertainty in control selection using Bayesian approach.
  • Use predictors instead of units comparable to the treated series.
  • Demonstrates on Google data. Google trends can be used in place of a control.
  • Application to simulated data which I attempt to reproduce.
  • CausalImpact package.

The model

In the most general sense Brodersen suggests using a Bayesian structural time-series model. This can be represented as,

\[ \begin{split} y_t &= Z_t^\intercal \alpha_t + \epsilon_t\\ \alpha_{t+1} &= T_t \alpha_t + R_t \eta_t. \end{split} \]

  • observation equation
  • state equation

The model - example

  • one predictor \(x_t\)
  • dynamic coefficient
  • dynamic trend
  • spike and slab prior

\[ y_t = \left[ \begin{matrix} 1 & 0 & x_t\\ \end{matrix} \right] \left[\begin{matrix} \mu_{t+1}\\ \delta_{t+1}\\ \beta_{t+1} \end{matrix}\right] + \epsilon_t, \]

\[ \left[\begin{matrix} \mu_{t+1}\\ \delta_{t+1}\\ \beta_{t+1} \end{matrix}\right] = \left[\begin{matrix} 1 & 0 & 1\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{matrix}\right] \left[\begin{matrix} \mu_{t}\\ \beta_{t}\\ \delta_{t} \end{matrix}\right] + \left[\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0\\ 0 & 0 & 1 \end{matrix} \right] \left[\begin{matrix} \eta_{\mu,t}\\ \eta_{\delta,t}\\ \eta_{\beta,t} \end{matrix}\right]. \]

The prior - spike and slab prior

\[ p(\varrho,\beta,1/\sigma_\epsilon^2)=p(\varrho)p(\sigma_\epsilon^2|\varrho)p(\beta_\varrho|\varrho,\sigma_\epsilon^2) \]

\[\varrho\sim\text{Bernoulli}(.)\]

\[(\beta_\varrho|\sigma_\epsilon^2,\varrho=1)\sim\mathcal{N}(.)\]

\[(1/\sigma_\epsilon^2|\varrho=1) \sim \mathcal{G}(.)\]

The prior - spike and slab prior

\[ p(\varrho,\beta,1/\sigma_\epsilon^2)=p(\varrho)p(\sigma_\epsilon^2|\varrho)p(\beta_\varrho|\varrho,\sigma_\epsilon^2) \]

\[\varrho\sim\text{Bernoulli}(.)\]

  • expected model size M
  • J coefficients
  • M/J

\[(\beta_\varrho|\sigma_\epsilon^2,\varrho=1)\sim\mathcal{N}(.)\]

  • \(R^2\)
  • weight to prior (#obs)

\[(1/\sigma_\epsilon^2|\varrho=1) \sim \mathcal{G}(.)\] Zellner’s g-prior

Simulation

\[ \begin{split} y_t &= \beta_{t,1}x_{t,1}+\beta_{t,2}x_{t,2}+\mu_t+\epsilon_t\\ \beta_{t,i} &\sim \mathcal{N}(\beta_{t-1,i},0.01^2);\quad \beta_{0,i}=0; \quad i \in \{1,2\}\\ \mu_t &\sim \mathcal{N}(\mu_{t-1},0.1^2);\quad\mu_0=20\\ \epsilon_t &\sim \mathcal{N}(0,0.1^2). \end{split} \]

Simulation - intervention

Brodersen et al. (2015) applies a multiplicative factor to imitate a causal effect so that the final observations are given by \(y^*_t=y_t \mathbb{I}\{t<366\}+y_t(1+e)(1-\mathbb{I}\{t<366\})\), where \(\mathbb{I}\{f(t)\}\) is the indicator function that evaluates to 1 when \(t \in \{w : f(w)\}\) and 0 otherwise.

Simulation - intervention

Figure 1: Examples of how Brodersen et al. (2015) applies a multiplicative factor in simulations to imitate the effect of a sustained intervention effect.

Simulation settings

  • 256 times each
  • \(e\in\{0,0.001,0.01,0.05, 0.1, 0.25 ,1\}\)
  • \(e=0.1\) and \(\max T - 366 \in \{30,60,90,120,150,180\}\)

Simulation - example

Figure 2: An example simulation realisation with effect size 0.1 and intervention, e.g. ad campaign, lasting 180 days.

Simulation - example

Figure 3: (A) Using the same simulation realisation as in Figure 2, the response is compared to a retrospectively forecasted counterfactual, \(\hat{Y}_t(0)\), along with a 95% credibility interval which gets wider as the time since the start of the campaign increases. (B) The point estimate causal effect based on the observed response less the forecasted counterfactual. Dashed line shows ground truth (by construction). (C) The estimated cumulative causal effect defined as the sum of point causal effects.

Simulation - power curve / coverage

Figure 4: (A) The empirical prevalence of credibility intervals that exclude zero in the positive direction, i.e., a power curve, for the simulation setting with a 180 day intervention period. (B) The empirical coverage of causal effect credibility interval for simulations of different campaign durations. Frequentist 95% confidence interval estimated by \(\hat{p} \pm 1.96 \sqrt{\hat{p} (1-\hat{p}) / n}\). Bayesian 95% credibility interval determined with a uniform(0,1) prior. The point estimate in the frequentist case is the MLE \(\hat p\) and in the Bayesian case the posterior expectation.

Simulation - structural change

\[ \beta_{t,i} \sim \begin{cases} \mathcal{N}(\beta_{t-1,i},\ 0.01^2) &\text{if } t < 366+90\\ \mathcal{N}(\beta_{t-1,i},\ c^2) &\text{otherwise}. \end{cases} \]

  • 64 times each
  • \(c\in\{0.01,0.1,0.3\}\)

Simulation - structural change

Figure 5: The absolute percentage error for pointwise treatment effect compared to the constructed true effect for a structural change in the coefficient dynamics. Vertical line indicates structural change. Shaded 95% confidence intervals estimated by \(\hat\mu\pm 1.96\hat\sigma/\sqrt{64}\) at each time point, where \(\hat\mu\) is the sample mean absolute % error, and \(\hat\sigma\) is the sample standard deviation of the mean absolute % error. Note that a new SD of 0.01 corresponds to no structural change.

Reproducibility

  • set.seed()
  • simulation output stored in .Rds for convenience (simulation took \(\approx\) 4h)
  • sessionInfo.txt
  • quarto to make report reproducible
  • github

Conclusion

  • Brodersen argues that Bayesian structural time series models are appropriate to assess causality.
    • His argument from empirical and simulated data hold, but falls short in terms of attention to detail.
  • Brodersen fails to argue using generally accepted causal language -> I formalise somewhat
  • Simulations falls short of reproducible -> I made reproducible

References

Abadie, Alberto, Alexis Diamond, and And Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Article. Journal of the American Statistical Association 105 (490): 493–505. https://doi.org/10.1198/jasa.2009.ap08746.
Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” Article. American Economic Review 93 (1): 113–32. https://doi.org/10.1257/000282803321455188.
Brodersen, Kay H, Fabian Gallusser, Jim Koehler, Nicolas Remy, and Steven L Scott. 2015. “Inferring Causal Impact Using Bayesian Structural Time-Series Models.” The Annals of Applied Statistics, 247–74.